Keyword [ChestX-ray14]
Wang X, Peng Y, Lu L, et al. TieNet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 9049-9058.
1. Overview
This paper proposes TieNet (Text-Image Embedding Network).
- CNN-RNN
- Multi-level attention
- highlights the meaningful text words and image regions
- generates reports
- learns a paired text-image representation during training
- two enhancements:
- AETE. attention-encoded text embedding
- SW-GAP. saliency weighted global average pooling
1.1. Related Work
- image captioning
1.2. Task Type
1.2.1. Medical Image Auto-Annotation
- omit the generation of sequential words
- backpropagate only the classification loss
1.2.2. Automatic Classification and Reporting of Thorax Disease
- training. image + report
- testing. only image
1.3. Architecture
1.3.1. CNN
- word embedding. (T, d_w)
- output of transition layer. X (D, D, C), D=16, C=1024
1.3.2. RNN
- phi(X) maps X to the initial hidden state h_0 (dimension d_x to d_h)
- Input.
- w. previously generated word
- a. previously generated weight
1.3.3. Attention Text Enhancement
- G. weights (r, T)
- H. (d_h, T)
- W_s1 (s, d_h)
- W_s2 (r, s)
- r. the number of global attentions
- M. embedding matrix (r, d_h)
- apply max-over-r pooling across M to highlight the important words
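
The shapes listed above fit together as in the following sketch. It assumes the usual structured self-attention form G = softmax(W_s2 tanh(W_s1 H)) and M = G H^T, which is consistent with the dimensions in these notes but is not spelled out in them. The function name, the random inputs, and the choice d_h = 256 are illustrative only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_encoded_text_embedding(H, W_s1, W_s2):
    """H: (d_h, T) hidden states, W_s1: (s, d_h), W_s2: (r, s)."""
    # Assumed form of the attention weights; each row is a distribution over the T words.
    G = softmax(W_s2 @ np.tanh(W_s1 @ H), axis=1)   # (r, T)
    M = G @ H.T                                     # (r, d_h)
    x_aete = M.max(axis=0)                          # max-over-r pooling -> (d_h,)
    return G, M, x_aete

# Illustrative sizes: s and r follow the notes, d_h and T are assumptions.
rng = np.random.default_rng(0)
d_h, T, s, r = 256, 12, 2000, 5
H = rng.standard_normal((d_h, T))
W_s1 = 0.01 * rng.standard_normal((s, d_h))
W_s2 = 0.01 * rng.standard_normal((r, s))
G, M, x_aete = attention_encoded_text_embedding(H, W_s1, W_s2)
print(G.shape, M.shape, x_aete.shape)
```

Max pooling over the r rows keeps, for each hidden dimension, the strongest of the r attention views, so the result has a fixed size regardless of report length T.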
1.3.4. Saliency Weighted Global Average Pooling
- reuse G to highlight image regions
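
One possible reading of this step, as a rough sketch: collapse G into a per-word importance vector, use it to mix per-word spatial attention maps into a single saliency map, and use that map as pooling weights over X. The per-word maps A of shape (T, D, D), the max-over-r collapse, and both normalisations are assumptions made for illustration; the notes only state that G is reused.

```python
import numpy as np

def sw_gap(X, A, G):
    """X: (D, D, C) features, A: (T, D, D) per-word spatial attention (assumed), G: (r, T)."""
    g = G.max(axis=0)                            # (T,) word importance, assumed max over r
    g = g / g.sum()                              # normalise to sum to 1
    saliency = np.tensordot(g, A, axes=(0, 0))   # (D, D) word-weighted spatial map
    saliency = saliency / saliency.sum()         # pooling weights sum to 1
    # Weighted average over the D x D grid replaces the uniform average of plain GAP.
    return np.tensordot(saliency, X, axes=([0, 1], [0, 1]))  # (C,)

rng = np.random.default_rng(0)
D, C, T, r = 16, 1024, 12, 5
X = rng.standard_normal((D, D, C))
A = rng.random((T, D, D))
G = rng.random((r, T))
x_img = sw_gap(X, A, G)
print(x_img.shape)
```

With a uniform saliency map this reduces to ordinary global average pooling, which is a convenient sanity check.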
1.3.5. Joint Training
- concatenate the text and image embeddings, then use an FC layer to predict the classification
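
A minimal sketch of the joint head, assuming a text embedding of size d_h = 256, an image embedding of size C = 1024, and a sigmoid output for multi-label prediction. The function name, the weights W and b, and the sigmoid choice are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_classifier(x_text, x_image, W, b):
    """Concatenate both embeddings and apply one fully connected layer."""
    x = np.concatenate([x_text, x_image])   # (d_h + C,)
    return sigmoid(W @ x + b)               # one independent probability per label

rng = np.random.default_rng(0)
d_h, C, n_labels = 256, 1024, 15
x_text = rng.standard_normal(d_h)     # stand-in for the text embedding
x_image = rng.standard_normal(C)      # stand-in for the image embedding
W = 0.01 * rng.standard_normal((n_labels, d_h + C))
b = np.zeros(n_labels)
probs = joint_classifier(x_text, x_image, W, b)
print(probs.shape)
```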
1.3.6. Details
- classification. output vector of length 15
- word2vec. 200 dimensions
- vocabulary. 15,427 words that appear at least twice, plus out-of-vocabulary, start, and end tokens
- LSTM 256 cell, 350 unit, s=2000, r=5
- α=1
- 0.5 dropout, 1e-4 for L2 regularization
- 1e-3 LR, Adam
- balanced loss.
- β. balances images with at least one disease against images with no disease
- λ. balances images with a certain disease against images without it
- Loss
- L_R. RNN loss
- L_C. classification loss
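
The sketch below shows one way these pieces could be combined. It assumes a weighted binary cross-entropy for L_C and a convex combination α·L_C + (1 − α)·L_R for the total; neither formula is written out in these notes. The weights w_pos and w_neg are hypothetical stand-ins for the balancing factors, not the exact β and λ definitions.

```python
import numpy as np

def weighted_bce(p, y, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Binary cross-entropy with separate weights for positive and negative labels."""
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    loss = -(w_pos * y * np.log(p) + w_neg * (1.0 - y) * np.log(1.0 - p))
    return float(loss.mean())

def overall_loss(L_C, L_R, alpha):
    """Assumed combination: alpha weights classification, (1 - alpha) weights the RNN loss."""
    return alpha * L_C + (1.0 - alpha) * L_R

y = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)   # toy multi-label targets
p = np.full_like(y, 0.5)                            # toy predictions
L_C = weighted_bce(p, y)
L_R = 3.0                                           # placeholder value for the RNN loss
print(overall_loss(L_C, L_R, alpha=1.0))
```

Under this assumed form, α=1 removes the RNN term entirely, so only the classification loss is backpropagated.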
2. Experiments
2.1. Dataset
- ChestX-ray14
- Hand-labeled
- OpenI